From research to research synthesis in CALL


Alex Boulton


Abstract. Any research study can only be fully appreciated once it is situated in 
relation to existing work. This is no mean feat, however, given the sheer quantity and 
variety of publications to date. Simply relying on one’s background and experience 
as an expert in the field, coupled with a few internet searches and following up 
individual references, is likely to lead to a very partial view. This paper argues the 
need for greater rigour (via meta-analytic and other types of syntheses) to gain a 
broader, deeper and more balanced understanding of Computer-Assisted Language
Learning (CALL). 

1. Conducting research

The ‘scientific method’ has been developed in an attempt to reduce human 
fallibility in exploring the world around us. However, different researchers clearly 
go about their work in vastly disparate ways even within a single clearly-defined 
discipline such as CALL. Many attempts have been made to describe the different 
practices, one of the most common distinctions being between quantitative and 
qualitative research. A number of surveys have found the former to be prevalent in 
international journals in applied linguistics (e.g. Richards, 2009), which may fuel a 
popular perception that it is more prestigious or even more ‘scientific’ in some way. 

However, there is disagreement about exactly what qualitative and quantitative 
methods are, and debate about whether there is a clear boundary between them. 
On the face of it, any set of data is open to some sort of quantification - and 
indeed needs to be, otherwise it is impossible to know what to make of discussions 
of a single example, blog extract, or interview response. Is it representative of a 
more widespread phenomenon, or just an interesting but isolated case? Any data
which can be counted but is not (or which stops at the level of raw numbers or
percentages) properly invites scepticism from the reader. In the end, the take-home
message of many qualitative papers is that the situation is complex (for which the 
sceptic may read ‘vague’ and ‘subjective’), but that the researchers have found 
at least some evidence pointing in the right direction (if they are to be believed). 
Similarly, any overtly quantitative data also needs interpretation for it to make 
any meaningful contribution. As with qualitative research, it is surprisingly easy to 
go through the motions and produce poor quantitative studies by simply grinding 
numbers through an esoteric statistical test chosen for mysterious reasons, leading 
to a ‘voilà’ moment of p < .05. This is markedly unsatisfactory, and has the opposite
defect of qualitative studies in being misleadingly simplistic. While each approach 
is thus easy to criticise on scientific grounds or personal/cultural preference, it 
seems likely that the most robust research will derive from truly mixed-methods 
designs. 

2. Reviewing research 

The scientific enterprise is incremental and no single study will definitively 
answer any given issue. The question then is how to gain an accurate overview 
of research to date where even a small field like CALL sees many hundreds of 
studies published every year, often with conflicting results. The sheer number of 
publications means it is always possible to find some evidence that justifies almost
anything (Hattie, 2009, p. 6); it is therefore essential to find ways to bring greater 
rigour to research synthesis. Considerable advances have been made in this 
direction since the publication of the seminal paper by Norris and Ortega in 2000. 
Today, research synthesis has become almost a field in its own right, with a number
of handbooks, recommendations by academic associations and scientific journals, 
and special issues of prestigious journals or collected volumes. Norris and Ortega’s 
(2010) TimeLine review in Language Teaching gives a glimpse of the wealth of 
work in the area. 

Most research synthesis begins with an extensive and principled trawl of the 
literature related to a clearly defined question, but then can branch in different 
directions, each with its advantages and disadvantages (see Plonsky, 2014 for 
an overview). The narrative synthesis represents a qualitative approach: it can 
incorporate any type of study and allows for interpretation and contextualisation 
by the synthesist, but thereby remains open to charges of subjectivity; and while 
the picture is carefully nuanced, the final impression may remain correspondingly 
vague and fuzzy (Han, 2015). Quantitative approaches, on the other hand, attempt
to be more objective in their interpretation of the results, but by definition only
cater for studies that provide appropriate quantitative data; like primary quantitative 
studies, meta-analyses tend to leave a single numerical value as the take-home 
message for the casual reader, which is simplistic and misleading, and does not do 
justice to the sub-analyses of moderator variables (for other types of synthesis in 
CALL, see Burston 2013, 2015; Felix 2005, 2008). 

3. Meta-analyses in CALL 

The principal phases of a meta-analysis consist in outlining the scope of the topic, collecting and
selecting publications, coding and extracting the data for analysis, calculating 
effect sizes and interpreting them according to various moderator variables - all 
according to stringent, pre-determined criteria. Though many decisions need to be 
made, the main constant in most meta-analyses is the calculation of the effect size;
most in applied linguistics use Cohen’s d. This basically compares the difference 
in means between the control and experimental groups (or pre- and post-tests), 
while taking into account the variance as given in the pooled standard deviations 
(Figure 1). Effect size is in many ways more revealing than the more common 
significance testing; it is recommended if not required by recent APA standards and 
journals such as Language Learning, while some researchers seem to think that 
p-values are at best uninformative and at worst positively harmful, and should be
systematically replaced by effect sizes (e.g. Plonsky, 2011). One major advantage 
of using a standard measure of effect size is that it enables direct comparison of 
different studies, which is not possible with p-values or narrative syntheses.

Figure 1. Formula for Cohen’s d (see note 2)

$$ d = \frac{M_2 - M_1}{\sqrt{\dfrac{SD_1^2 + SD_2^2}{2}}} $$
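
The calculation described above (the difference in group means divided by the pooled standard deviation) can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in any particular meta-analysis: it assumes the simple pooled standard deviation, i.e. the square root of the average of the two group variances, and the function name and the test scores are hypothetical, invented purely for the example.

```python
import math
from statistics import mean, stdev


def cohens_d(control, experimental):
    """Cohen's d: difference in means over the pooled standard deviation.

    Assumption: the pooled SD is sqrt((SD1^2 + SD2^2) / 2), which weights
    both groups equally regardless of their size.
    """
    m1, m2 = mean(control), mean(experimental)
    sd1, sd2 = stdev(control), stdev(experimental)  # sample standard deviations
    sd_pooled = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m2 - m1) / sd_pooled


# Hypothetical test scores, not taken from any study.
control_scores = [10, 12, 14, 16, 18]
experimental_scores = [12, 14, 16, 18, 20]

d = cohens_d(control_scores, experimental_scores)
print(f"d = {d:.2f}")
```

With these invented scores the two groups differ by 2 points and share a standard deviation of about 3.16, so d comes out at roughly 0.63. Because the result is expressed in standard deviation units, it can be set alongside values from studies that used entirely different tests or scales.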


2. Pre/post (within-groups) and control/experimental (between-groups) designs are not distinguished in this short
paper.

As a field, CALL is now sufficiently mature to have given rise to several
meta-analyses, some of which are given in Table 1; k refers to the number of studies
covered in the analysis, and d is the effect size itself. The value for d needs
interpreting (just as do p-values, whose thresholds tend to be set arbitrarily at .05 or .01). For
applied linguistics, Oswald and Plonsky (2010) find an average effect size of 0.7,
and suggest that 0.4 should be considered small, 1.0 large. The first thing to note
from Table 1 is that there are no negative d-values in any of the meta-analyses. 
This is not surprising, since few primary studies set out to discredit an experimental 
treatment against a control group, and would not expect lower scores in a post-test 
following treatment. Second, most of the effect sizes are not particularly large, 
the unweighted mean being just 0.66, with the higher ones mainly derived from 
smaller samples. Overall, this suggests a medium strength effect of computer- 
assisted language learning as seen from many different perspectives over many 
dozens of primary studies involving thousands of learners using a wide variety of 
tools and techniques. Third, each arrives at a different value; a single meta-analysis 
does not provide a definitive picture of a field (compare Plonsky and Brown’s 
(2015) discussion of the differing results of 18 meta-analyses of feedback). 


Table 1. Meta-analyses in CALL (partly based on Oswald & Plonsky, 2010)

| Study | Year | Source | Question | k | d |
|---|---|---|---|---|---|
| Abraham | 2008 | Computer Assisted Language Learning | computer-mediated glosses in vocabulary learning | 6 | 1.40 |
| Abraham | 2008 | Computer Assisted Language Learning | computer-mediated glosses in reading comprehension | 11 | 0.73 |
| Chiu | 2013 | British Journal of Educational Technology | computer-assisted second language vocabulary instruction | 16 | 0.75 |
| Chiu et al. | 2012 | British Journal of Educational Technology | digital game-based learning | 14 | 0.53 |
| Cobb & Boulton | 2015 | Cambridge University Press | data-driven learning | 21 | 1.04 |
| Grgurovic et al. | 2013 | ReCALL | CALL-based language learning | 65 | 0.26 |
| Lin, H. | 2014 | Language Learning & Technology | CMC and SLA | 59 | 0.44 |
| Lin, H. | 2015 | ReCALL | CMC in L2 oral proficiency development | 25 | 0.40 |
| Lin, W.C. et al. | 2013 | Language Learning & Technology | text-based SCMC on SLA | 19 | 0.33 |
| Taylor | 2009 | CALICO Journal | CALL-based versus paper-based glosses | 32 | 0.49 |
| Yun | 2011 | Computer Assisted Language Learning | hypertext glosses in vocabulary acquisition | 10 | 0.37 |
| Zhao | 2003 | CALICO Journal | overall effectiveness of uses of technology in language education | 9 | 1.12 |
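
The unweighted mean quoted in the text can be checked directly from the k and d columns of Table 1, as in the sketch below. The k-weighted mean is added only as a rough illustration of the remark that the higher values mainly come from smaller samples: weighting by the number of studies is an assumption made here for simplicity, not the method of this paper or of the meta-analyses listed, which would normally weight by sample size or inverse variance. The short study labels and function names are likewise hypothetical.

```python
from statistics import mean

# (label, k, d) transcribed from Table 1; the labels are shorthand only.
TABLE_1 = [
    ("Abraham 2008, vocabulary", 6, 1.40),
    ("Abraham 2008, reading", 11, 0.73),
    ("Chiu 2013", 16, 0.75),
    ("Chiu et al. 2012", 14, 0.53),
    ("Cobb & Boulton 2015", 21, 1.04),
    ("Grgurovic et al. 2013", 65, 0.26),
    ("Lin, H. 2014", 59, 0.44),
    ("Lin, H. 2015", 25, 0.40),
    ("Lin, W.C. et al. 2013", 19, 0.33),
    ("Taylor 2009", 32, 0.49),
    ("Yun 2011", 10, 0.37),
    ("Zhao 2003", 9, 1.12),
]


def unweighted_mean_d(rows):
    """Plain average of the d values, each meta-analysis counting once."""
    return mean(d for _, _, d in rows)


def k_weighted_mean_d(rows):
    """Average of d weighted by k (illustrative assumption, see lead-in)."""
    total_k = sum(k for _, k, _ in rows)
    return sum(k * d for _, k, d in rows) / total_k


print(f"unweighted mean d: {unweighted_mean_d(TABLE_1):.3f}")
print(f"k-weighted mean d: {k_weighted_mean_d(TABLE_1):.3f}")
```

The unweighted mean comes out at about 0.66, matching the figure given in the text, while the k-weighted mean is lower, at roughly 0.51, because the largest d-values (1.40, 1.12) belong to the analyses with the fewest studies.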


Though as Grgurovic, Chapelle, and Shelley (2013) point out, it can be politically 
useful to be able to quantify the effects of CALL, attempting to account for 
all the research in a single figure is obviously hugely simplistic. In quantitative 
research, a major failing in primary studies is that the variation (even if reported 
in standard deviations) is ironed out in a single overall figure; by definition, a
meta-analysis involves far more variation which is also ignored if we only take
away a single figure. Fortunately, meta-analysts do not just provide a single 
overall figure for effect size, they also discuss and interpret it, and in particular 
conduct analysis of potential moderator variables precisely to see what factors 
may explain the variation between studies. While it is not possible to go into 
details here, the reader is strongly encouraged not to take away the simple notion 
that d = 0.66 for CALL (unless it is politically or strategically expedient to justify
budgets or investment at a local level, unethical though that may be), but to 
consult the various meta-analyses listed to see how each explains the variation 
it uncovers among the primary studies, and to decide whether the variation 
between the meta-analyses themselves may be attributed to their specific design 
or research questions. It is also of course important to go to the relevant primary 
studies, but approaching them after consulting a meta-analysis may help to keep 
them in perspective. 

4. Conclusions 

Meta-analysis can be “an immensely valuable scholarly contribution that brings 
order to confusion, helps set a future research agenda, and at the same time gives 
the best evidence-based practical advice” (Cumming, 2012, p. 231), but has its
limitations and should never be taken as the ultimate answer to a question. What 
is needed is always more research: more primary studies of different types (where 
syntheses can help identify areas in need of work), and more syntheses to help 
make sense of them - again, both qualitative and quantitative. 